Adaptive Query-Based Sampling of Distributed Collections

Authors

  • Mark Baillie
  • Leif Azzopardi
  • Fabio Crestani
Abstract

In a Distributed Information Retrieval system, a description of each remote information resource, archive or repository is usually stored centrally to facilitate resource selection. Acquiring precise resource descriptions is therefore an important phase in Distributed Information Retrieval, because the quality of these representations affects selection accuracy and, ultimately, retrieval performance. Query-Based Sampling is currently used for content discovery of uncooperative resources, but applying the technique depends on heuristic guidelines to determine when a sufficiently accurate representation of each remote resource has been obtained. In this paper we address this shortcoming by using the Predictive Likelihood both to indicate the quality of an acquired resource description estimate and to determine when a sufficiently good representation of a resource has been obtained during Query-Based Sampling.
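
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the add-`mu` smoothing, the fixed `vocab_size`, the set of probe terms, and the "change below `epsilon` for `patience` rounds" stopping rule are all assumptions made for illustration. The sketch samples a toy collection with single-term queries, builds a term-count description from the returned documents, and stops once the average log-likelihood of the probe terms under that description stops changing.

```python
import math
import random
from collections import Counter


def predictive_likelihood(model, total, vocab_size, probe_terms, mu=1.0):
    """Average log-probability of probe terms under an add-mu smoothed unigram model."""
    denom = total + mu * vocab_size
    return sum(math.log((model[t] + mu) / denom) for t in probe_terms) / len(probe_terms)


def adaptive_qbs(search, seed_term, probe_terms, vocab_size, docs_per_query=4,
                 epsilon=0.01, patience=3, max_queries=200, rng=None):
    """Sample a resource until the predictive likelihood stabilises (hypothetical rule)."""
    rng = rng or random.Random(0)
    model = Counter()   # term counts: the resource description estimate
    seen = set()        # ids of documents already sampled
    term, prev, stable = seed_term, None, 0
    n_queries = 0
    for n_queries in range(1, max_queries + 1):
        # Send a single-term query and keep only the top few unseen documents.
        for doc_id, text in search(term)[:docs_per_query]:
            if doc_id not in seen:
                seen.add(doc_id)
                model.update(text.split())
        total = sum(model.values())
        pl = predictive_likelihood(model, total, vocab_size, probe_terms)
        # Count consecutive rounds in which the estimate barely moved.
        if prev is not None and abs(pl - prev) < epsilon:
            stable += 1
            if stable >= patience:
                break
        else:
            stable = 0
        prev = pl
        # Draw the next query term from the description learned so far.
        term = rng.choice(sorted(model)) if model else seed_term
    return model, len(seen), n_queries


def make_search(docs):
    """Toy stand-in for an uncooperative resource's search interface."""
    def search(term):
        return [(i, d) for i, d in enumerate(docs) if term in d.split()]
    return search


docs = [
    "distributed information retrieval resource selection",
    "query based sampling of remote resource content",
    "language model estimate for retrieval",
    "sampling documents with query terms",
    "selection accuracy and retrieval performance",
]
model, n_docs, n_queries = adaptive_qbs(
    make_search(docs), "retrieval", ["query", "sampling", "resource", "selection"],
    vocab_size=1000,  # assumed constant; a real sampler would have to estimate this
)
print(n_docs, n_queries, model.most_common(3))
```

A fixed number of sampled documents would be the heuristic alternative; the point of the sketch is only that the stopping decision is driven by a measured quantity rather than a preset budget.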

Similar resources

Obtaining Language Models of Web Collections Using Query-Based Sampling Techniques

In the context of information retrieval, traditional collection selection algorithms have been widely studied. These algorithms utilize language models, a representation of the contents of each text collection over which selection is to be performed, but these language models cannot always be easily acquired. Query-based sampling is a technique by which these language models are discovered by i...

Sample Sizes for Query Probing in Uncooperative Distributed Information Retrieval

The goal of distributed information retrieval is to support effective searching over multiple document collections. For efficiency, queries should be routed to only those collections that are likely to contain relevant documents, so it is necessary to first obtain information about the content of the target collections. In an uncooperative environment, query probing — where randomly-chosen quer...

The Effects of Query-based Sampling on Automatic Database Selection Algorithms Keywords: Distributed Collections, Merging Search Results/Information Synthesis, Database Selection

Database selection algorithms need to know the subject areas covered by each text database, but this metadata can be difficult to acquire in multi-party environments, such as the Internet, where each party has different interests and capabilities. Query-based sampling is a relatively new technique in which metadata is inferred by interacting with each text database and observing the outcomes. Quer...

Query-driven Adaptive Term Set Search in large Peer-to-peer Textual Collections

Most search mechanisms in Distributed Hash Table based peer-to-peer systems depend on multiple single-keyword search operations. This increases the traffic cost and gives poor accuracy. Pre-computing a term-set-based index can reduce the cost but requires an exponentially growing index. Based on the observations made, queries are usually short and the users have limi...

A Clustered Index Approach to Distributed XPath

Supporting top-k queries over distributed collections of schemaless XML data poses two challenges. While XML supports expressive query languages such as XPath and XQuery, writing an appropriate query in these languages requires schema knowledge, which may not be available in distributed systems with autonomous and dynamic sources. Thus, there is a need for approximate query processing. Furthe...

Publication date: 2006